Image processing device, ranging device and method

ABSTRACT

According to one embodiment, an image processing device includes storage and a processor. The storage stores a statistical model generated by learning bokeh produced in a first image affected by aberration of an optical system, the bokeh changing nonlinearly in accordance with a distance to a subject in the first image. The processor obtains a second image affected by the aberration of the optical system. The processor inputs the second image to the statistical model and obtains distance information indicating a distance to a subject in the second image.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2019-043814, filed Mar. 11, 2019, the entire contents of which are incorporated herein by reference.

FIELD

Embodiments described herein relate generally to an image processing device, a ranging device and a method.

BACKGROUND

In general, to obtain the distance to a subject, the use of images captured by two capture devices (cameras) or a stereo camera (compound-eye camera) has been known. In recent years, a technology for obtaining the distance to a subject using an image captured by a single capture device (monocular camera) has been developed.

However, when a distance is obtained from an image captured by a single capture device, it is difficult to achieve high robustness.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example of the configuration of a ranging system including an image processing device according to an embodiment.

FIG. 2 shows an example of the system configuration of the image processing device.

FIG. 3 is shown for explaining the outline of the operation of the ranging system.

FIG. 4 shows the relationship between the distance to a subject and the bokeh produced in an image by chromatic aberration when a single lens is used.

FIG. 5 shows the relationship between the distance to a subject and the bokeh produced in an image by chromatic aberration when an achromatic lens is used.

FIG. 6 shows the relationship between the size of the aperture of the diaphragm mechanism provided in a capture device and a PSF shape.

FIG. 7 shows an example of the PSF shape produced in the image of each channel.

FIG. 8 shows another example of the PSF shape produced in the image of each channel.

FIG. 9 shows the relationship between the non-linearity of the PSF shape and the shape of the aperture of the diaphragm mechanism.

FIG. 10 shows the outline of the operation for obtaining distance information.

FIG. 11 is shown for explaining a first method for estimating a distance from a captured image.

FIG. 12 shows an example of information input to a statistical model in the first method.

FIG. 13 is shown for explaining a second method for estimating a distance from a captured image.

FIG. 14 shows an example of information input to a statistical model in the second method.

FIG. 15 is shown for explaining a third method for estimating a distance from a captured image.

FIG. 16 shows an example of the learning method of a statistical model.

FIG. 17 is shown for specifically explaining the distance to a subject estimated from an image.

FIG. 18 is a flowchart showing an example of the processing procedure for generating a statistical model.

FIG. 19 is a flowchart showing an example of the processing procedure of the image processing device when distance information is obtained from a captured image.

FIG. 20 is shown for explaining the outline of a modification of the present embodiment.

FIG. 21 shows an example of the learning method of a statistical model.

FIG. 22 is a flowchart showing an example of the processing procedure of the image processing device when distance information is obtained from a captured image.

FIG. 23 shows an example of the functional configuration of a mobile object including a ranging device.

FIG. 24 is shown for explaining a case where the mobile object is an automobile.

FIG. 25 is shown for explaining a case where the mobile object is a drone.

FIG. 26 is shown for explaining a case where the mobile object is an autonomous mobile robot.

FIG. 27 is shown for explaining a case where the mobile object is a robotic arm.

DETAILED DESCRIPTION

Various embodiments will be described hereinafter with reference to the accompanying drawings.

In general, according to one embodiment, an image processing device includes storage and a processor. The storage stores a statistical model generated by learning bokeh produced in a first image affected by aberration of an optical system, the bokeh changing nonlinearly in accordance with a distance to a subject in the first image. The processor obtains a second image affected by the aberration of the optical system. The processor inputs the second image to the statistical model and obtains distance information indicating a distance to a subject in the second image.

FIG. 1 shows an example of the configuration of a ranging system including an image processing device according to the present embodiment. The ranging system 1 shown in FIG. 1 is used to capture an image and obtain (measure) the distance from the capture point to the subject using the captured image.

As shown in FIG. 1, the ranging system 1 includes a capture device 2 and an image processing device 3. In the present embodiment, the ranging system 1 includes the capture device 2 and the image processing device 3 as separate devices. However, the ranging system 1 may be realized as a single device (ranging device) in which the capture device 2 functions as a capture unit and the image processing device 3 functions as an image processor. The image processing device 3 may operate as, for example, a server which performs various kinds of cloud computing services.

The capture device 2 is used to capture various types of images. The capture device 2 includes a lens 21 and an image sensor 22. The lens 21 and the image sensor 22 are equivalent to the optical system (monocular camera) of the capture device 2. The optical system of the capture device 2 further includes, for example, a diaphragm mechanism (not shown) including an aperture for adjusting the amount of light taken into the optical system of the capture device 2 (in other words, the amount of entering light).

The light reflected on the subject enters the lens 21. The light which entered the lens 21 passes through the lens 21. The light which passed through the lens 21 reaches the image sensor 22 and is received (detected) by the image sensor 22. The image sensor 22 generates an image consisting of a plurality of pixels by converting the received light into electric signals (photoelectric conversion).

The image sensor 22 is realized by, for example, a charge coupled device (CCD) image sensor or a complementary metal oxide semiconductor (CMOS) image sensor. The image sensor 22 includes, for example, a first sensor (R sensor) 221 which detects light having a red (R) wavelength band, a second sensor (G sensor) 222 which detects light having a green (G) wavelength band and a third sensor (B sensor) 223 which detects light having a blue (B) wavelength band. The image sensor 22 is capable of receiving light having the corresponding wavelength bands by the first to third sensors 221 to 223 and generating sensor images (an R image, a G image and a B image) corresponding to the wavelength bands (color components). The image captured by the capture device 2 is a color image (RGB image) and includes an R image, a G image and a B image.

In the present embodiment, the image sensor 22 includes the first to third sensors 221 to 223. However, the image sensor 22 may be configured to include at least one of the first to third sensors 221 to 223. The image sensor 22 may include, for example, a sensor for generating a monochromatic image instead of the first to third sensors 221 to 223.

In the present embodiment, an image generated based on the light which passed through the lens 21 is an image affected by the aberration of the optical system (lens 21), and includes bokeh (defocus blur) produced by the aberration. The details of the bokeh produced in an image are described later.

The image processing device 3 shown in FIG. 1 includes, as functional structures, a statistical model storage 31, an image acquisition module 32, a distance acquisition module 33 and an output module 34.

In the statistical model storage 31, a statistical model used to obtain the distance to a subject from an image captured by the capture device 2 is stored. The statistical model stored in the statistical model storage 31 is generated by learning the bokeh produced in an image (first image) affected by the above aberration of the optical system and changing nonlinearly in accordance with the distance to the subject in the image. It should be noted that the statistical model may be generated by applying various types of known machine learning algorithms such as neural networks or random forests. The neural networks applicable in the present embodiment may include, for example, a convolutional neural network (CNN), a fully-connected neural network and a recurrent neural network.

The image acquisition module 32 obtains an image (second image) captured by the capture device 2 from the capture device 2 (image sensor 22).

The distance acquisition module 33 uses an image obtained by the image acquisition module 32 and obtains distance information indicating the distance to the subject in the image. In this case, the distance acquisition module 33 inputs the image to the statistical model stored in the statistical model storage 31 to obtain distance information indicating the distance to the subject in the image.

For example, the output module 34 outputs the distance information obtained by the distance acquisition module 33 in a map form in which the distance information is positionally associated with an image. In this case, the output module 34 is capable of outputting image data consisting of pixels in which the distance indicated by distance information is a pixel value (in other words, the output module 34 is capable of outputting distance information as image data). When distance information is output as image data in this manner, for example, the image data can be displayed as a distance image indicating the distance by colors. For example, the distance information output by the output module 34 may be used to calculate the size of a subject in an image captured by the capture device 2.
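For instance, such a distance image can be produced by normalizing each distance and mapping it to a color. The following is a minimal sketch; the clipping range and the red-to-blue color ramp are illustrative assumptions, not part of the embodiment.

```python
import numpy as np

def distance_map_to_color(depth_mm, d_min=500.0, d_max=3000.0):
    """Render a (H, W) distance map in mm as an (H, W, 3) color image.

    Near subjects become red, distant subjects blue; the d_min..d_max
    clipping range is an illustrative assumption.
    """
    t = np.clip((depth_mm - d_min) / (d_max - d_min), 0.0, 1.0)
    rgb = np.empty(depth_mm.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = ((1.0 - t) * 255).astype(np.uint8)  # red channel: near
    rgb[..., 1] = 0                                   # green unused here
    rgb[..., 2] = (t * 255).astype(np.uint8)          # blue channel: far
    return rgb
```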

FIG. 2 shows an example of the system configuration of the image processing device 3 shown in FIG. 1. The image processing device 3 includes a CPU 301, a nonvolatile memory 302, a RAM 303 and a communication device 304. The image processing device 3 further includes a bus 305 mutually connecting the CPU 301, the nonvolatile memory 302, the RAM 303 and the communication device 304.

The CPU 301 is a processor to control the operation of various components of the image processing device 3. The CPU 301 may be a single processor or include a plurality of processors. The CPU 301 executes various programs loaded from the nonvolatile memory 302 into the RAM 303. These programs include an operating system (OS) and various application programs. The application programs include an image processing program 303A that uses an image captured by the capture device 2 to obtain the distance from the capture device 2 to a subject in the image.

The nonvolatile memory 302 is a storage medium used as an auxiliary storage device. The RAM 303 is a storage medium used as a main storage device. FIG. 2 shows only the nonvolatile memory 302 and the RAM 303. However, the image processing device 3 may include another storage device such as a hard disk drive (HDD) or a solid state drive (SSD).

In the present embodiment, the statistical model storage 31 shown in FIG. 1 is realized by, for example, the nonvolatile memory 302 or another storage device.

In the present embodiment, some or all of the image acquisition module 32, the distance acquisition module 33 and the output module 34 are realized by causing the CPU 301 (in other words, the computer of the image processing device 3) to execute the image processing program 303A, in other words, by software. The image processing program 303A may be stored in a computer-readable storage medium and distributed, or may be downloaded into the image processing device 3 through a network. It should be noted that some or all of the modules 32 to 34 may be realized by hardware such as an integrated circuit (IC) or a combination of software and hardware.

The communication device 304 is a device configured to perform wired communication or wireless communication. The communication device 304 includes a transmitter which transmits a signal and a receiver which receives a signal. For example, the communication device 304 communicates with an external device via a network and communicates with an external device present around the communication device 304. The external device includes the capture device 2. In this case, the image processing device 3 receives an image from the capture device 2 via the communication device 304.

Although omitted in FIG. 2, the image processing device 3 may further include an input device such as a mouse or keyboard, and a display device such as a display.

Now, this specification explains the outline of the operation of the ranging system 1 of the present embodiment with reference to FIG. 3.

In the ranging system 1, the capture device 2 (image sensor 22) generates an image affected by the aberration of the optical system (lens 21) as described above.

The image processing device 3 (image acquisition module 32) obtains an image generated by the capture device 2 and inputs the image to the statistical model stored in the statistical model storage 31.

The image processing device 3 (distance acquisition module 33) uses the statistical model and obtains distance information indicating the distance to a subject in the image input to the statistical model.

In this way, in the present embodiment, distance information can be obtained from an image captured by the capture device 2 using a statistical model.

In the present embodiment, an image captured by the capture device 2 includes bokeh caused by the aberration of the optical system of the capture device 2 (lens aberration) as stated above.

The bokeh produced in an image is explained below. In the present embodiment, this specification mainly explains chromatic aberration regarding the bokeh caused by the aberration of the optical system of the capture device 2.

FIG. 4 shows the relationship between the distance to a subject and the bokeh produced in an image by chromatic aberration. In the following explanation, a position in focus in the capture device 2 is referred to as a focus position.

The refractive index of light when light passes through the lens 21 having aberration differs depending on the wavelength band. Thus, for example, when the position of the subject is out of the focus position, light of various wavelength bands is not concentrated at one point and reaches different points. This emerges as chromatic aberration (bokeh) on an image.

The upper stage of FIG. 4 shows a case where the position of the subject is more distant from the capture device 2 (image sensor 22) than the focus position (in other words, the position of the subject is on the rear side of the focus position).

In this case, regarding light 401 having a red wavelength band, an image including comparatively small bokeh b_(R) is generated in the image sensor 22 (first sensor 221). Regarding light 402 having a blue wavelength band, an image including comparatively large bokeh b_(B) is generated in the image sensor 22 (third sensor 223). Regarding light 403 having a green wavelength band, an image including bokeh whose size is intermediate between bokeh b_(R) and bokeh b_(B) is generated. Thus, in an image captured in a state in which the position of the subject is more distant from the capture device 2 than the focus position, blue bokeh is observed on the outer side of the subject in the image.

The lower stage of FIG. 4 shows a case where the position of the subject is closer to the capture device 2 (image sensor 22) than the focus position (in other words, the position of the subject is on the capture device 2 side with respect to the focus position).

In this case, regarding light 401 having a red wavelength band, an image including comparatively large bokeh b_(R) is generated in the image sensor 22 (first sensor 221). Regarding light 402 having a blue wavelength band, an image including comparatively small bokeh b_(B) is generated in the image sensor 22 (third sensor 223). Regarding light 403 having a green wavelength band, an image including bokeh whose size is intermediate between bokeh b_(R) and bokeh b_(B) is generated. Thus, in an image captured in a state in which the position of the subject is closer to the capture device 2 than the focus position, red bokeh is observed on the outer side of the subject in the image.

In the example of FIG. 4, the lens 21 is simply a single lens. However, in general, for example, a lens to which chromatic aberration correction has been applied (hereinafter referred to as an achromatic lens) may be used in the capture device 2. The achromatic lens is a combination of a convex lens element with low dispersion and a concave lens element with high dispersion, and is the lens with the fewest elements among lenses which correct chromatic aberration.

FIG. 5 shows the relationship between the distance to a subject and the bokeh produced in an image by chromatic aberration when the above achromatic lens is used for the lens 21. The achromatic lens is designed to bring a blue wavelength and a red wavelength into focus. However, chromatic aberration cannot be completely eliminated. Thus, when the position of the subject is more distant from the capture device 2 than the focus position, green bokeh is produced. When the position of the subject is closer to the capture device 2 than the focus position, purple bokeh is produced.

The middle stage of FIG. 4 shows a case where the position of the subject is matched with the focus position with respect to the capture device 2 (image sensor 22). In this case, an image having less bokeh is generated in the image sensor 22 (first to third sensors 221 to 223).
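The behavior shown in FIG. 4 and FIG. 5 can be reproduced qualitatively with a thin-lens model in which each color channel has a slightly different effective focal length. The following sketch is illustrative only: the per-channel focal lengths and the pixel pitch are assumed values, not parameters of the embodiment.

```python
import numpy as np

def signed_blur_px(d_mm, f_channel_mm, f_ref_mm=50.0, d_focus_mm=1500.0,
                   f_number=1.8, pixel_pitch_mm=0.02):
    """Signed thin-lens defocus blur, in pixels, for one color channel.

    The sensor plane is fixed so that a subject at d_focus_mm is sharp
    for the reference focal length f_ref_mm; f_channel_mm models the
    slightly different focal length of one wavelength band (chromatic
    aberration). Positive = subject on the rear side of the focus
    position, negative = on the capture-device side. All default values
    below are illustrative assumptions.
    """
    sensor_mm = 1.0 / (1.0 / f_ref_mm - 1.0 / d_focus_mm)      # fixed sensor distance
    image_mm = 1.0 / (1.0 / f_channel_mm - 1.0 / d_mm)         # where this channel focuses
    aperture_mm = f_ref_mm / f_number                          # aperture diameter f/N
    blur_mm = aperture_mm * (sensor_mm - image_mm) / image_mm  # signed blur diameter
    return blur_mm / pixel_pitch_mm

# A simple lens focuses blue slightly shorter than red, so each channel's
# blur differs in size on either side of the focus position, as in FIG. 4.
for d in (1000.0, 1500.0, 2500.0):
    print(d, {c: round(signed_blur_px(d, f), 1)
              for c, f in (("R", 50.05), ("G", 50.0), ("B", 49.95))})
```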

As described above, the diaphragm mechanism is provided in the optical system of the capture device 2. The shape of the bokeh produced in an image captured by the capture device 2 differs depending on the size of the aperture of the diaphragm mechanism. The shape of bokeh is referred to as a point spread function (PSF) shape, and indicates the diffusion distribution of light produced when a point source is captured.

FIG. 6 shows the relationship between the size of the aperture of the diaphragm mechanism provided in the capture device 2 and the PSF shape.

The upper stage of FIG. 6 shows the PSF shapes produced in the images captured by the capture device 2 having a focus position of 1500 mm, a camera lens focal length of 50 mm and an F-number (diaphragm) of F1.8. The PSF shapes are arranged from left to right in order of increasing distance from the subject to the capture device 2. The lower stage of FIG. 6 shows the PSF shapes produced in the images captured by the capture device 2 having a focus position of 1500 mm, a camera lens focal length of 50 mm and an F-number (diaphragm) of F4, arranged from left to right in the same manner. In FIG. 6, the middle shape of each of the upper and lower stages is the PSF shape obtained when the position of the subject is matched with the focus position.

The F-number is a numerical conversion of the amount of light taken into the capture device 2 (optical system). The amount of light taken into the capture device 2 increases (in other words, the aperture is larger) as the F-number decreases; thus, F1.8 admits more light than F4.

In FIG. 6, the PSF shapes shown at corresponding positions in the upper and lower stages are the PSF shapes produced when the position of the subject with respect to the capture device 2 is the same. However, even when the position of the subject is the same, the PSF shape of the upper stage (the PSF shape produced in the image captured by the capture device 2 having an F-number of F1.8) is different from the PSF shape of the lower stage (the PSF shape produced in the image captured by the capture device 2 having an F-number of F4).

Moreover, as shown in the leftmost PSF shapes and the rightmost PSF shapes in FIG. 6, for example, even if the distance from the position of the subject to the focus position is substantially the same, the PSF shape differs between when the position of the subject is closer to the capture device 2 than the focus position and when the position of the subject is more distant from the capture device 2 than the focus position.

The phenomenon in which the PSF shape differs depending on the size of the aperture of the diaphragm mechanism and the position of the subject with respect to the capture device 2 as described above also occurs in each channel (an RGB image, an R image, a G image and a B image). FIG. 7 shows the PSF shapes produced in the images of the channels captured by the capture device 2 having a focus position of 1500 mm, a camera lens focal length of 50 mm and an F-number of F1.8. In FIG. 7, the PSF shapes are separated based on whether the position of the subject is closer to the capture device 2 than the focus position (on the capture device 2 side) or more distant from the capture device 2 than the focus position (on the rear side). FIG. 8 shows the PSF shapes produced in the images of the channels captured by the capture device 2 having a focus position of 1500 mm, a camera lens focal length of 50 mm and an F-number of F4. In FIG. 8, the PSF shapes are separated based on whether the position of the subject is closer to the capture device 2 than the focus position or more distant from the capture device 2 than the focus position.

The image processing device 3 (ranging system 1) of the present embodiment obtains the distance to a subject from an image, using the statistical model generated in consideration of bokeh (the color, size and shape) changing nonlinearly in accordance with the distance to the subject (in other words, the position of the subject with respect to the capture device 2) in the image as described above. In the present embodiment, the bokeh changing nonlinearly includes, for example, the bokeh produced by the chromatic aberration of the optical system of the capture device 2 as explained above in FIG. 4 and FIG. 5 and the bokeh produced based on the size of the aperture of the diaphragm mechanism for adjusting the amount of light taken into the optical system of the capture device 2 as explained in FIG. 6 to FIG. 8.

Furthermore, the PSF shape differs depending on the shape of the aperture of the diaphragm mechanism. FIG. 9 shows the relationship between the non-linearity (asymmetry) of the PSF shape and the shape of the aperture of the diaphragm mechanism. The above non-linearity of the PSF shape easily arises when the shape of the aperture of the diaphragm mechanism is other than a circle. In particular, the non-linearity of the PSF shape arises more easily when the shape of the aperture is an odd-sided polygon, or an even-sided polygon arranged asymmetrically with respect to the horizontal or vertical line of the image sensor.
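In a geometric-optics approximation, the defocus PSF simply takes the shape of the aperture, so the asymmetry of a polygonal aperture can be visualized by rasterizing a polygon as a blur kernel. The following sketch ignores diffraction and real lens design; the kernel size and polygon parameters are illustrative assumptions.

```python
import numpy as np

def polygon_aperture_psf(size=31, radius_px=10.0, sides=5, rotation=0.0):
    """Geometric-optics PSF kernel shaped like a regular polygonal aperture.

    An odd-sided polygon (e.g. sides=5) yields an asymmetric PSF, the
    non-linearity discussed above; diffraction and lens aberrations are
    deliberately ignored.
    """
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    ang = np.arctan2(y, x) - rotation
    step = 2.0 * np.pi / sides
    # distance from the center to the polygon edge along direction ang
    r_edge = radius_px * np.cos(step / 2.0) / np.cos((ang % step) - step / 2.0)
    psf = (np.hypot(x, y) <= r_edge).astype(np.float64)
    return psf / psf.sum()

pentagon_psf = polygon_aperture_psf(sides=5)    # asymmetric odd-gon PSF
circle_psf = polygon_aperture_psf(sides=256)    # nearly circular aperture
```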

In the present embodiment, when the focus position of the capture device 2 is fixed, the light which passed through the lens 21 has a response shape described by a point spread function (PSF) which changes depending on the distance to a subject. An image is generated by detecting this light with the image sensor 22.

FIG. 10 shows the outline of the operation for obtaining distance information in the present embodiment. In the following explanation, an image captured by the capture device 2 to obtain distance information (the distance to a subject) is referred to as a captured image.

The bokeh (bokeh information) 502 produced in the captured image 501 shown in FIG. 10 is a physical clue to the distance to a subject 503. Specifically, the color of the bokeh, and the size and shape regarding the PSF, are clues to the distance to the subject 503.

The image processing device 3 (distance acquisition module 33) of the present embodiment estimates the distance 504 to the subject 503 by analyzing, with a statistical model, the bokeh 502 produced in the captured image 501 as a physical clue.

Now, this specification explains an example of a method for estimating the distance based on a captured image using a statistical model. Here, first to third methods are explained.

The first method is explained with reference to FIG. 11. In the first method, the distance acquisition module 33 extracts a local area (image patch) 501a from the captured image 501.

In this case, for example, the entire area of the captured image 501 may be divided into a matrix, and the partial areas after the division may be extracted in series as the local areas 501a. Alternatively, image recognition may be applied to the captured image 501, and the local areas 501a may be extracted to cover the area in which the subject (image) is detected. A local area 501a may partially overlap with another local area 501a.

The distance acquisition module 33 inputs information related to each extracted local area 501a (information of the captured image 501) to a statistical model. In this way, the distance acquisition module 33 estimates the distance 504 to the subject in each local area 501a.

Thus, the statistical model to which information related to each local area 501a is input estimates the distance for each of the pixels included in the local area 501a.

For example, when a particular pixel belongs to both a first local area 501a and a second local area 501a (in other words, when the first local area 501a overlaps with the second local area 501a in the area including the pixel), the distance estimated by treating the pixel as belonging to the first local area 501a may be different from the distance estimated by treating it as belonging to the second local area 501a.

Thus, for example, as described above, when a plurality of local areas 501a which partially overlap with each other are extracted, the distance of the pixels included in the overlapping area may be the mean value of the distance estimated with regard to a part (pixel) of one of the overlapping local areas 501a and the distance estimated with regard to a part (pixel) of the other local area 501a. When three or more partially overlapping local areas 501a are extracted, the distance of the pixels included in the area in which the three or more local areas 501a overlap may be determined by majority vote among the distances estimated for the parts of the three or more overlapping local areas 501a.
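A straightforward way to realize this patch-wise estimation with overlap averaging is to slide a window over the image, accumulate the per-pixel estimates, and divide by the per-pixel counts. The following is a minimal sketch; the `model` callable and its patch-in, per-pixel-distance-out interface are hypothetical assumptions.

```python
import numpy as np

def estimate_depth_by_patches(image, model, patch=64, stride=32):
    """First method with overlap: run the model on sliding patches and
    average the per-pixel estimates where patches overlap.

    `image` is (H, W, C); `model(patch_hwc)` is assumed to return a
    (patch, patch) array of per-pixel distances (hypothetical interface).
    Edge strips not covered by a full patch keep a zero estimate here.
    """
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float64)   # running sum of estimates
    cnt = np.zeros((h, w), dtype=np.float64)   # number of estimates per pixel
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            est = model(image[top:top + patch, left:left + patch])
            acc[top:top + patch, left:left + patch] += est
            cnt[top:top + patch, left:left + patch] += 1.0
    cnt[cnt == 0] = 1.0                        # avoid division by zero
    return acc / cnt                           # mean over overlapping patches
```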

FIG. 12 shows an example of the information related to each local area 501a input to the statistical model in the above first method.

The distance acquisition module 33 generates the gradient data of each local area 501a extracted from the captured image 501 with regard to each of the R image, G image and B image included in the captured image 501 (specifically, the gradient data of the R image, the gradient data of the G image and the gradient data of the B image). The gradient data generated by the distance acquisition module 33 in this manner is input to the statistical model.

The gradient data indicates the difference (difference value) of the pixel value between each pixel and its adjacent pixel. For example, when each local area 501a is extracted as a rectangular area of n pixels (X-axis direction) × m pixels (Y-axis direction), gradient data is generated in which the difference values calculated between each of the pixels included in the local area 501a and, for example, its adjacent pixel on the right are arranged in a matrix of n rows × m columns.

The statistical model uses the gradient data of the R image, the gradient data of the G image and the gradient data of the B image and estimates the distance based on the bokeh produced in each image. FIG. 12 shows a case where the gradient data of each of the R image, the G image and the B image is input to the statistical model. However, the gradient data of the captured image 501 (RGB image) may be input to the statistical model.
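Concretely, the gradient data amounts to a per-channel difference map. A minimal sketch follows; the choice of the right-hand neighbor and the zero padding of the last column are assumptions consistent with the description above.

```python
import numpy as np

def gradient_data(patch_hwc):
    """Difference between each pixel and its right-hand neighbor, per channel.

    Input: (n, m, 3) local area with R, G, B channels.
    Output: (3, n, m) gradient maps, one per channel, matching the
    n-rows x m-columns layout described above (last column padded with 0).
    """
    p = patch_hwc.astype(np.float64)
    grad = np.zeros_like(p)
    grad[:, :-1, :] = p[:, 1:, :] - p[:, :-1, :]   # right neighbor minus pixel
    return np.transpose(grad, (2, 0, 1))           # (channel, n, m)
```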

Now, this specification explains the second method with reference to FIG. 13. In the second method, as the information related to each local area 501a in the first method, the gradient data of the local area (image patch) 501a and the location information of the local area 501a in the captured image 501 are input to the statistical model.

For example, the location information 501b may indicate the center point of the local area 501a, or a predetermined point such as the upper-left corner. As the location information 501b, the location information on the captured image 501 of each of the pixels included in the local area (image patch) 501a may be used.

By further inputting the location information 501b to the statistical model as described above, for example, when the bokeh of the subject image formed by the light passing through the middle portion of the lens 21 is different from the bokeh of the subject image formed by the light passing through the end portion of the lens 21, the effect of the difference on the estimation of the distance can be eliminated.

Thus, in the second method, the distance can be more reliably estimated from the captured image 501 based on the correlation between the bokeh, the distance and the position on the image.

FIG. 14 shows an example of the information related to each local area 501a that is input to the statistical model in the above second method.

For example, when a rectangular area of n pixels (X-axis direction) × m pixels (Y-axis direction) is extracted as a local area 501a, the image acquisition module 32 obtains an X-coordinate value (X-coordinate data) and a Y-coordinate value (Y-coordinate data) on the captured image 501 corresponding to, for example, the center point of the local area 501a. This data is input to the distance acquisition module 33.

In the second method, the X-coordinate data and the Y-coordinate data obtained in this manner are input to the statistical model together with the above gradient data of the R image, the G image and the B image.
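One way to feed such location information to a model alongside the gradient data is to append the normalized patch-center coordinates as extra constant input channels. A sketch follows; the [0, 1] normalization and the channel layout are illustrative assumptions.

```python
import numpy as np

def add_location_channels(grad_cnm, center_x, center_y, image_w, image_h):
    """Second method: append the patch-center coordinates as model inputs.

    `grad_cnm` is the (3, n, m) gradient tensor from the first method;
    the X and Y coordinates are normalized to [0, 1] and broadcast to
    constant channels (an illustrative encoding, not the embodiment's).
    """
    _, n, m = grad_cnm.shape
    x_ch = np.full((1, n, m), center_x / image_w)   # constant X channel
    y_ch = np.full((1, n, m), center_y / image_h)   # constant Y channel
    return np.concatenate([grad_cnm, x_ch, y_ch], axis=0)   # (5, n, m)
```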

The third method is further explained with reference to FIG. 15. The third method differs from the above first and second methods in that the local areas (image patches) 501a are not extracted from the captured image 501. In the third method, the distance acquisition module 33 inputs information related to the entire area of the captured image 501 (the gradient data of the R image, the G image and the B image) to the statistical model.

In comparison with the first and second methods which estimate the distance 504 for each local area 501a, the third method may increase the uncertainty of the estimation by the statistical model. However, the load on the distance acquisition module 33 can be reduced.

In the following explanation, the information input to the statistical model in the above first to third methods is referred to as information related to an image for convenience.

FIG. 16 shows an example of the learning method of the statistical model in the present embodiment. Here, the learning of the statistical model using an image captured by the capture device 2 is explained. However, the learning of the statistical model may be performed using an image captured by, for example, another device (a camera or the like) including an optical system similar to the optical system of the capture device 2.

In the above explanation, an image captured by the capture device 2 to obtain distance information is referred to as a captured image. In the present embodiment, an image used by the statistical model to learn bokeh changing nonlinearly in accordance with the distance is referred to as a learning image for convenience.

Whichever of the first method explained with reference to FIG. 11, the second method explained with reference to FIG. 13 and the third method explained with reference to FIG. 15 is used, the learning of the statistical model is performed basically by inputting information related to a learning image 601 to the statistical model and feeding back the difference between the distance (distance information) 602 estimated by the statistical model and the correct value 603 to the statistical model. Feeding back refers to updating the parameters (for example, the weight coefficients) of the statistical model so as to decrease the difference.

When the first method is adopted as the above method for estimating the distance based on a captured image, in the learning of the statistical model, similarly, information related to each of the local areas (image patches) extracted from the learning image 601 (gradient data) is input to the statistical model. The distance 602 of each pixel in each local area is estimated by the statistical model. The difference obtained by comparing the estimated distance 602 with the correct value 603 is fed back to the statistical model.

When the second method is adopted as the above method for estimating the distance based on a captured image, in the learning of the statistical model, similarly, gradient data and location information are input to the statistical model as information related to each of the local areas (image patches) extracted from the learning image 601. The distance 602 of each pixel in each local area is estimated by the statistical model. The difference obtained by comparing the estimated distance 602 with the correct value 603 is fed back to the statistical model.

When the third method is adopted as the method for estimating the distance based on a captured image, in the learning of the statistical model, similarly, information related to the entire area of the learning image 601 (gradient data) is input to the statistical model collectively. The distance 602 of each pixel in the learning image 601 is estimated by the statistical model. The difference obtained by comparing the estimated distance 602 with the correct value 603 is fed back to the statistical model.
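In all three variants, one feedback iteration reduces to a per-pixel regression update. Below is a minimal PyTorch-style sketch of a single step; the model architecture, the optimizer, and the use of mean squared error as "the difference" are assumptions, not specifics of the embodiment.

```python
import torch

def train_step(model, optimizer, inputs, target):
    """One feedback iteration: estimate, compare with the correct value
    (bokeh value), and update the weight coefficients.

    `inputs` is the gradient tensor prepared by the chosen method and
    `target` holds the per-pixel correct values; mean squared error
    stands in for 'the difference' fed back (an assumption).
    """
    optimizer.zero_grad()
    estimate = model(inputs)                     # per-pixel distance / bokeh value
    loss = torch.mean((estimate - target) ** 2)  # difference from the correct value
    loss.backward()                              # feed the difference back
    optimizer.step()                             # update parameters to shrink it
    return loss.item()
```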

The statistical model of the present embodiment is generated (prepared) by repeating learning using images captured while the distance from the capture device 2 to a subject is changed in a state where the focus position (focal length) is fixed. When the learning for one focus position is completed, another focus position is learned in a similar manner. In this way, a statistical model with higher accuracy can be generated. This statistical model is stored in the statistical model storage 31 included in the image processing device 3, and is used to obtain distance information from a captured image.

With reference to FIG. 17, this specification explains the details of the distance to a subject estimated from an image (a captured image or a learning image).

In FIG. 17, the size of the bokeh produced when the subject is closer to the capture device 2 than the focus position (on the capture device 2 side) is indicated by a negative number on the X-axis. The size of the bokeh produced when the subject is more distant from the capture device 2 than the focus position (on the rear side) is indicated by a positive number on the X-axis. In other words, in FIG. 17, the sign reflects the color of the bokeh and the absolute value reflects its size.

FIG. 17 shows that the absolute value of the size (pixels) of the bokeh is greater as the subject is more distant from the focus position, both when the position of the subject is closer to the capture device 2 than the focus position and when the position of the subject is more distant from the capture device 2 than the focus position.

The example of FIG. 17 assumes a case where the focus position in the optical system which captured the image is approximately 1500 mm. In this case, for example, bokeh of approximately −4.8 pixels corresponds to a distance of approximately 1000 mm from the optical system, bokeh of 0 pixels corresponds to a distance of 1500 mm from the optical system, and bokeh of approximately +4.8 pixels corresponds to a distance of approximately 2750 mm from the optical system (a positive bokeh value corresponds to a subject on the rear side of the focus position, consistent with the sign convention above).
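The shape of this curve follows from a standard thin-lens approximation rather than from anything specific to the embodiment: for focal length $f$, F-number $N$, pixel pitch $p$ and focus distance $d_f$, the signed blur size in pixels behaves approximately as

$$
b(d) \approx \frac{f^{2}}{N\,p\,(d_{f}-f)}\left(1-\frac{d_{f}}{d}\right),
$$

which is zero at $d = d_f$, negative for $d < d_f$, positive for $d > d_f$, and saturates as $d$ grows. This is why the mapping from bokeh value to distance is monotonic but nonlinear, and why equal bokeh magnitudes on the near and far sides correspond to unequal distance offsets from the focus position.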

Here, for convenience, the size (pixels) of the bokeh is indicated on the X-axis. However, as explained in the above FIG. 6 to FIG. 8, the shape of the bokeh (PSF shape) produced in an image also differs depending on whether the subject is closer to the capture device 2 than the focus position or more distant from the capture device 2 than the focus position. Thus, the value indicated on the X-axis in FIG. 17 may be a value reflecting the shape of the bokeh (PSF shape).

When information related to a learning image is input to a statistical model in the learning of the statistical model, a positive or negative number (hereinafter referred to as a bokeh value) indicating the color, size and shape of the bokeh corresponding to the actual distance to the subject at the time of capturing the learning image is used as the correct value. A statistical model trained in this way outputs such a bokeh value as the distance to the subject in an image.

For example, as indicated by line segment d1 of FIG. 17, the distance to the subject correlates with the color and size (and the shape) of the bokeh. Therefore, estimating the distance is synonymous with estimating the color, size and shape of the bokeh.

The estimation by a statistical model can be more accurate when the statistical model estimates the color, size and shape of the bokeh than when the statistical model directly estimates the distance. For example, when information related to a local area of n pixels (X-axis direction) × m pixels (Y-axis direction) is input to a statistical model, the statistical model outputs the distances (specifically, the bokeh values indicating the color, size and shape of the bokeh) estimated for the respective pixels included in the local area, arranged in an array of n rows × m columns.

In the learning of a statistical model, learning images are prepared by capturing a subject at each distance, with as fine a granularity as possible, from the lower limit (the capture device 2 side) to the upper limit (the rear side) of the distance which can be obtained (estimated) by the image processing device 3. Information related to these learning images is input to the statistical model. As the correct value used in the learning of the statistical model, a bokeh value indicating the color, size and shape of the bokeh corresponding to the distance to the subject at the time of capturing each of the above learning images is used. For the learning of the statistical model, various learning images of different subjects should preferably be prepared.

Now, with reference to the flowchart of FIG. 18, this specification explains the processing procedure for generating the statistical model used in the image processing device 3 according to the present embodiment. The process shown in FIG. 18 may be performed in, for example, either the image processing device 3 or another device.

Information related to a learning image prepared in advance is input to a statistical model (step S1). For example, the learning image is generated by the image sensor 22 based on the light which passed through the lens 21 provided in the capture device 2, and is affected by the aberration of the optical system (lens 21) of the capture device 2. Specifically, the learning image has bokeh which changes nonlinearly in accordance with the distance to the subject as explained in the above FIG. 4 to FIG. 8.

It is assumed that the image processing device 3 or the other device performing the process shown in FIG. 18 knows the information of the optical system which captured the learning image (for example, the size of the aperture of the diaphragm mechanism). The information correlates with the bokeh produced in the learning image.

When the above first method is applied as the method for estimating the distance based on a captured image, as the information related to a learning image, the gradient data of the R image, the G image and the B image is input to the statistical model for each local area of the learning image.

When the above second method is applied as the method for estimating the distance based on a captured image, as the information related to a learning image, the gradient data of the R image, the G image and the B image and the location information of each local area on the learning image are input to the statistical model for each local area.

When the above third method is applied as the method for estimating the distance based on a captured image, as the information related to a learning image, the gradient data of the R image, the G image and the B image for the entire area of the learning image is input to the statistical model.

In the present embodiment, this specification explains that the gradient data of the R image, the G image and the B image is input to the statistical model. However, when the distance is estimated in terms of the shape (PSF shape) of the bokeh produced in a learning image as described above, the gradient data of at least one of the R image, the G image and the B image should be input to the statistical model. When the distance is estimated in terms of the color and size of the bokeh produced in a learning image by chromatic aberration, the gradient data of at least two of the R image, the G image and the B image should be input to the statistical model.

When the information related to a learning image is input to the statistical model, the distance to the subject is estimated by the statistical model (step S2). In this case, the bokeh produced in the learning image is extracted from the learning image by the statistical model, and a distance corresponding to the bokeh is estimated.

The distance estimated in step S2 is compared with the correct value obtained when the learning image was captured (step S3).

The result of comparison (difference) in step S3 is fed back to the statistical model (step S4). In this way, in the statistical model, the parameters are updated to decrease the difference (in other words, the bokeh produced in the learning image is learned).

By repeating the process shown in FIG. 18 for each learning image, a statistical model which has learned bokeh changing nonlinearly in accordance with the distance to the subject in each learning image is generated. The statistical model generated in this manner is stored in the statistical model storage 31 included in the image processing device 3.

With reference to the flowchart shown in FIG. 19, this specification explains an example of the processing procedure of the image processing device 3 when distance information is obtained from a captured image.

The capture device 2 (image sensor 22) captures an image of a subject and thus generates the captured image including the subject. The captured image is affected by the aberration of the optical system (lens 21) of the capture device 2 as described above.

It is assumed that the image processing device 3 knows the information of the optical system of the capture device 2 which captured the captured image (for example, the size of the aperture of the diaphragm mechanism). The information correlates with the bokeh produced in the captured image.

The image acquisition module 32 included in the image processing device 3 obtains the captured image from the capture device 2 (step S11).

Subsequently, the distance acquisition module 33 inputs the information related to the captured image obtained in step S11 to the statistical model stored in the statistical model storage 31 (the statistical model trained in advance by performing the process shown in FIG. 18) (step S12). The process of step S12 is similar to that of step S1 shown in the above FIG. 18; therefore, the detailed explanation thereof is omitted here.

When the process of step S12 is performed, the distance to the subject is estimated in the statistical model, and the statistical model outputs the estimated distance. The distance to the subject is estimated and output for each of the pixels included in the captured image. In this way, the distance acquisition module 33 obtains the distance information indicating the distance output from the statistical model (step S13).
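Steps S12 and S13 thus amount to preparing the gradient data and collecting the model's per-pixel output. A minimal sketch of this inference path follows; the `model` interface mapping a channel-first gradient tensor to an (H, W) distance array is a hypothetical assumption.

```python
import numpy as np

def infer_distance_map(model, captured_rgb):
    """Steps S12-S13: input gradient data of the captured image to the
    learned statistical model and obtain a per-pixel distance map.

    `captured_rgb` is an (H, W, 3) array; `model` is assumed to map a
    (3, H, W) gradient tensor to an (H, W) array of distances (an
    illustrative interface, in the style of the third method).
    """
    img = captured_rgb.astype(np.float64)
    grad = np.zeros_like(img)
    grad[:, :-1, :] = img[:, 1:, :] - img[:, :-1, :]   # right-neighbor difference
    return model(np.transpose(grad, (2, 0, 1)))        # (H, W) distances
```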

After the process of step S13 is performed, for example, the output module 34 outputs the distance information obtained in step S13 in a map form in which the distance information is positionally associated with the captured image 501 (step S14). In the present embodiment, this specification mainly explains that the distance information is output in a map form. However, the distance information may be output in another form.

As described above, in the present embodiment, a statistical model generated by learning bokeh which is produced in a learning image (first image) affected by the aberration of the optical system and changes nonlinearly in accordance with the distance to the subject in the image is stored in the statistical model storage 31 in advance. When a captured image affected by the aberration of the optical system is obtained, the captured image is input to the statistical model. In this way, distance information indicating the distance to the subject in the captured image is obtained.

In the present embodiment, the bokeh which changes nonlinearly in accordance with the distance to a subject in an image includes, for example, at least one of the bokeh produced by the chromatic aberration of the optical system and the bokeh produced in accordance with the size or shape of the aperture of the diaphragm mechanism for adjusting the amount of light taken into the optical system. In the present embodiment, this specification mainly explains only chromatic aberration as the aberration of the optical system. However, the statistical model used in the present embodiment may learn the bokeh produced by another type of aberration (in other words, may obtain distance information based on the bokeh produced by another type of aberration). In the present embodiment, for example, the distance can be estimated from the monochromatic aberration produced in a monochromatic image. However, the accuracy of estimation of the distance can be improved with a color image having chromatic aberration.

In the present embodiment, as the distance to a subject in an image correlates with the bokeh produced in the image, the bokeh (bokeh information) which changes in accordance with the distance can be extracted from a captured image and a distance corresponding to the bokeh can be estimated using a statistical model.

In the present embodiment, the distance to a subject in a captured image is estimated by a statistical model which performs learning (deep learning) focusing on bokeh changing nonlinearly in accordance with the distance to a subject in an image affected by the aberration of the optical system (lens 21) as described above. Thus, distance information indicating the estimated distance can be obtained.

For example, the distance could also be estimated using a statistical model which performs learning with the bokeh information and the semantic information of the entire image. However, in this case, specific bokeh information cannot be exploited. Further, a large amount of learning data is needed to achieve robustness to the environment (in other words, so that the distance can be estimated from various captured images with high accuracy).

In the present embodiment, a statistical model learns only the bokeh produced in an image. Therefore, in comparison with the above case where learning is performed with bokeh information and semantic information, the robustness at the time of obtaining the distance (distance information) from a captured image can be improved (in other words, high robustness can be realized).

In another possible structure, a filter may be provided in the aperture of a monocular camera (in other words, a modification is applied to the lens of the camera) to estimate the distance with the camera. However, in this structure, the light transmittance is decreased by the filter, and the color balance is easily disturbed. Further, the cost is high because the filter adds components.

In the present embodiment, the light transmittance is not decreased, the color balance is maintained, and the cost is not increased.

In the present embodiment, when a statistical model learns bokeh for each local area extracted from an image, a statistical model which can estimate the distance with high accuracy from a captured image can be generated. In this case, by inputting information related to each local area extracted from a captured image to the statistical model, distance information indicating the distance to a subject in each local area can be obtained.

Information related to a local area includes, for example, information indicating the difference of the pixel value between each of the pixels included in the local area and its adjacent pixel. However, another type of information may be used as information related to a local area.

Specifically, as information related to a local area, the location information of the local area in an image may be further input to the statistical model. In this configuration, distance information with higher accuracy can be obtained in consideration of the position of the local area. The location information is, for example, information indicating the coordinates of the center point of the local area on a captured image. However, the location information may be another type of information.

In the above description, this specification explains a case where a statistical model learns bokeh for each local area extracted from an image. However, when a statistical model learns bokeh for the entire area of a learning image collectively, and the bokeh for the entire area of a captured image is input to estimate the distance, the calculation load on the image processing device 3 (distance acquisition module 33), etc., can be reduced.

In the present embodiment, a statistical model is explained as, for example, a neural network or random forests. However, another type of algorithm may be applied.

Now, this specification explains the image processing device 3 according to an example of a modification of the present embodiment. In the following explanation, the same portions as the above drawings used in the explanation of the present embodiment are denoted by like reference numbers, and detailed description thereof is omitted. Portions different from those of the present embodiment are mainly explained.

This specification explains the outline of the modification with reference to FIG. 20. As shown in FIG. 20, in the present modification, when the statistical model estimates the distance 504 from information related to the captured image 501, the statistical model calculates the uncertainty 701 of the estimation for each pixel and outputs the uncertainty 701 together with the distance 504. The calculation method of the uncertainty 701 is not limited to a specific method. Various known methods can be applied.

In the present modification, the distance acquisition module 33 examines the uncertainty output from the statistical model. When the uncertainty is greater than or equal to a threshold, for example, the distance acquisition module 33 discards the obtained distance information (in other words, the distance information indicating a distance whose uncertainty is greater than or equal to the threshold). Distance information is output such that each distance information item is arranged at the position corresponding to the pixel for which the distance indicated by that item was estimated (in other words, in a map form). When a distance information item is discarded, for example, a value indicating that the distance (distance information) estimated by the statistical model is invalid is arranged at the position corresponding to the pixel for which the distance was estimated.

When the uncertainty of the distance estimated for a specific pixel is greater than or equal to the threshold, the distance acquisition module 33 is also capable of correcting the distance using the distances estimated for pixels around the specific pixel (in other words, distances whose uncertainty is less than the threshold). In this correction, for example, the mean value of the distances estimated for the surrounding pixels may be used as the correction value, or the correction value may be determined by majority vote among those distances.

FIG. 21 shows an example of the learning method of the statistical model in the present modification. As shown in FIG. 21, in the present modification in which the statistical model outputs the uncertainty, basically, information related to the learning image 601 is input to the statistical model, and the difference between the distance 602 estimated by the statistical model and the correct value 603 is fed back to the statistical model. However, the statistical model to which the information related to the learning image 601 is input also calculates, as described above, the uncertainty 702 of the estimated distance 602. Thus, in the present modification, the value fed back is the difference between the distance 602 and the correct value 603 divided by the square of the uncertainty 702. If the uncertainty 702 were infinite, this divided difference would become zero regardless of the error; thus, the square of the uncertainty 702 is added to the difference as a penalty.
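Written as a loss function, the description above corresponds to something like the following sketch; the use of mean squared error and this exact penalty form are assumptions (a common variant penalizes the logarithm of the squared uncertainty instead).

```python
import torch

def uncertainty_weighted_loss(estimate, target, sigma):
    """Loss following the description above: the squared error divided by
    the squared uncertainty, plus the squared uncertainty as a penalty so
    the model cannot shrink the loss simply by inflating the uncertainty.
    `sigma` is the per-pixel uncertainty output of the model.
    """
    return torch.mean((estimate - target) ** 2 / sigma ** 2 + sigma ** 2)
```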

In the present modification, the parameters (for example, the weight coefficients) of the statistical model are updated to decrease the value obtained by correcting the difference between the distance 602 and the correct value 603 with the uncertainty 702.

For example, when the uncertainty 702 is high while there is no difference between the distance 602 estimated by the statistical model and the correct value 603, the distance 602 is presumed to have been estimated by chance. In this case, it is possible to recognize that the learning of the distance 602 (correct value 603) is insufficient.

In the present modification, this deviation in learning can be reduced by using the uncertainty calculated by the statistical model.

Now, this specification explains the operation of the image processing device 3 of the present modification. The process for generating the statistical model used in the image processing device 3 of the present modification is the same as the process shown in the above FIG. 18 except that the difference corrected with the uncertainty is used as described above. Thus, the detailed explanation thereof is omitted.

With reference to the flowchart shown in FIG. 22, this specification explains the processing procedure of the image processing device 3 when distance information is obtained from a captured image.

The processes of steps S21 and S22, equivalent to the processes of steps S11 and S12 shown in the above FIG. 19, are performed.

In the present modification, when the process of step S22 is performed, the statistical model estimates the distance to a subject and calculates the uncertainty of the distance. The distance to a subject and the uncertainty are output from the statistical model for each of the pixels included in the captured image.

Accordingly, the distance acquisition module 33 obtains the distance information indicating the distance and the uncertainty output from the statistical model for each of the pixels included in the captured image (step S23).

Subsequently, the processes of steps S24 and S25 are performed for each distance information item obtained in step S23 (in other words, the distance information for each pixel). In the following explanation, the distance information to be processed in steps S24 and S25 is referred to as the target distance information, the uncertainty of the distance indicated by the target distance information is referred to as the target uncertainty, and the pixel of the captured image for which the distance indicated by the target distance information was estimated (output) by the statistical model is referred to as the target pixel.

In this case, the distance acquisition module 33 determines whether or not the target uncertainty is greater than or equal to a threshold (step S24).

When the distance acquisition module 33 determines that the targetuncertainty is greater than or equal to the threshold (YES in step S24),the distance acquisition module 33 specifies, of the distanceinformation for the pixels obtained in step S23, distance informationindicating the distance estimated for the pixels located around thetarget pixels in the captured image (hereinafter, referred to as thesurrounding pixels) in which the uncertainty for the indicated distanceis less than the threshold. Here, either a plurality of distanceinformation items or a single distance information item may bespecified. The distance acquisition module 33 corrects the distanceindicated by the target distance information, using the distanceindicated by the specified distance information (step S25). Whendistance information in which the uncertainty is less than the thresholdis not present in the distance information indicating the distanceestimated for the surrounding pixels, the distance indicated by thetarget distance information is set to, for example, the indefinite valuedetermined in advance.

When a plurality of distance information items are specified, the distance indicated by the target distance information may be corrected so as to be the mean value of the distances indicated by the distance information items (in other words, the distances estimated for the surrounding pixels), or may be corrected based on a majority decision among the distances indicated by the distance information items. When a single distance information item is specified, the distance indicated by the target distance information is corrected based on the distance indicated by that distance information item.
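
As a non-limiting illustration, the following Python sketch combines steps S23 to S26: every pixel whose uncertainty is greater than or equal to the threshold has its distance replaced by the mean of the distances of surrounding pixels whose uncertainty is less than the threshold, or by the indefinite value when no such pixel exists. The 2-D array layout, the 3x3 neighborhood and the choice of the mean rule (rather than majority decision) are assumptions.

```python
import numpy as np

INDEFINITE = -1.0  # hypothetical marker for the predetermined indefinite value

def correct_distances(distance, uncertainty, threshold, radius=1):
    """Applies steps S24 and S25 to every pixel (the loop of step S26).

    distance, uncertainty: 2-D NumPy arrays, one value per pixel of the
    captured image, as output by the statistical model in step S23.
    """
    corrected = distance.copy()
    h, w = distance.shape
    for y in range(h):
        for x in range(w):
            if uncertainty[y, x] < threshold:
                continue  # NO in step S24: keep the estimated distance
            # Collect the surrounding pixels (here a (2*radius+1)^2 window)
            # whose uncertainty is less than the threshold.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            d = distance[y0:y1, x0:x1]
            u = uncertainty[y0:y1, x0:x1]
            reliable = d[u < threshold]
            # Step S25: correct with the mean of the reliable distances,
            # or set the indefinite value when none exists.
            corrected[y, x] = reliable.mean() if reliable.size else INDEFINITE
    return corrected
```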

When the distance acquisition module 33 determines that the target uncertainty is not greater than or equal to the threshold (in other words, the target uncertainty is less than the threshold) (NO in step S24), the process of step S25 is not performed.

Subsequently, whether or not the processes of the above steps S24 and S25 have been performed for all the distance information obtained in step S23 is determined (step S26).

When it is determined that the processes have not been performed for all the distance information (NO in step S26), the procedure returns to step S24 to repeat the processes. In this case, the processes are repeated with distance information to which the process of step S24 or S25 has not yet been applied as the target distance information.

When it is determined that the processes have been performed for all the distance information (YES in step S26), the process of step S27, equivalent to the process of step S14 shown in the above FIG. 19, is performed.

In the example shown in FIG. 22, this specification explains that a distance whose uncertainty is greater than or equal to the threshold is corrected with the distances estimated for the surrounding pixels. However, the distance information indicating a distance whose uncertainty is greater than or equal to the threshold may instead be discarded and not output by the output module 34.
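
A sketch of this discarding variant, assuming NaN is used as the "not output" marker (the marker and the array layout are illustrative assumptions):

```python
import numpy as np

def discard_uncertain(distance, uncertainty, threshold):
    """Replaces distances whose uncertainty is greater than or equal to
    the threshold with NaN so that they are not output downstream."""
    out = distance.copy()
    out[uncertainty >= threshold] = np.nan
    return out
```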

As described above, in the present modification, by using the uncertainty calculated by the statistical model, it is possible to prevent the direct use of a distance whose uncertainty is greater than or equal to a threshold (in other words, a distance which is presumed to be estimated incorrectly because the uncertainty is high).

APPLICATION EXAMPLES

Now, this specification explains application examples of the ranging system 1 having the structures of the above embodiment and modification. Here, for the sake of convenience, this specification explains a case where the ranging system 1 is realized as a single device (ranging device) including a capture unit equivalent to the capture device 2 shown in FIG. 1 and an image processor equivalent to the image processing device 3. In the following drawings, this specification assumes that the ranging device 1 includes the capture unit 2 and the image processor 3.

FIG. 23 shows an example of the functional configuration of a mobile object 800 into which the ranging device 1 is incorporated. The mobile object 800 can be realized as, for example, an automobile, an unmanned aerial vehicle or an autonomous mobile robot including an automated driving function. The unmanned aerial vehicle is an airplane, rotorcraft, glider or airship which cannot carry people. The unmanned aerial vehicle can be flown by remote control or automatic piloting, and includes, for example, a drone (multicopter), a radio-controlled aircraft and a helicopter for pesticide spraying. The autonomous mobile robot includes a mobile robot such as an automated guided vehicle (AGV), a cleaning robot for cleaning floors, a communication robot which guides visitors in various ways, etc. The mobile object 800 is not limited to a robot whose main body moves. The mobile object 800 also includes an industrial robot including a drive mechanism which moves or rotates part of the robot, such as a robotic arm.

As shown in FIG. 23, the mobile object 800 includes, for example, the ranging device 1, a control signal generator 801 and a drive mechanism 802. The capture unit 2 provided in the ranging device 1 is installed so as to capture an image of a subject in the travel direction of the mobile object 800 or of part of the mobile object 800.

As shown in FIG. 24, when the mobile object 800 is an automobile 800A, the capture unit (capture device) 2 is installed as a front camera which captures an image of the front side. The capture unit 2 may be installed as a rear camera which captures an image of the rear side when the automobile backs up. A plurality of capture units 2 may be installed as a front camera and a rear camera. Further, the capture unit 2 may also function as a dashboard camera; in other words, the capture unit 2 may be a video recorder.

FIG. 25 shows an example when the mobile object 800 is a drone 800B. The drone 800B includes a drone main body 811 equivalent to the drive mechanism 802, and four propeller units 812 to 815. Each of the propeller units 812 to 815 includes a propeller and a motor. When the drive of each motor is transferred to the corresponding propeller, the propeller rotates, and the drone 800B flies by the lift generated by the rotation. The capture unit 2 (ranging device 1) is mounted in, for example, the lower part of the drone main body 811.

FIG. 26 shows an example when the mobile object 800 is an autonomous mobile robot 800C. A power unit 821 equivalent to the drive mechanism 802 and including a motor, wheels, etc., is provided in the lower part of the mobile robot 800C. The power unit 821 controls the number of revolutions of the motor and the direction of the wheels. In the mobile robot 800C, the wheels set on the road surface or floor surface rotate when the drive of the motor is transferred, and as the direction of the wheels is controlled, the mobile robot 800C is capable of moving in an arbitrary direction. In the example shown in FIG. 26, for example, the capture unit 2 is provided in the head of the humanoid mobile robot 800C so as to capture an image of the front side of the mobile robot 800C. The capture unit 2 may be provided so as to capture an image of the rear side or the left and right sides of the mobile robot 800C. A plurality of capture units 2 may be provided so as to capture images in a plurality of directions. The capture unit 2 may also be provided in a compact robot which has little space for mounting sensors, etc., in order to estimate the self-position, the posture and the position of a subject and to conduct dead reckoning.

When the mobile object 800 is a robotic arm 800D as shown in FIG. 27, and the movement and rotation of part of the robotic arm 800D are controlled, the capture unit 2 may be provided at, for example, the leading end of the robotic arm 800D. In this case, the capture unit 2 captures an image of the object to be held by the robotic arm 800D, and the image processor 3 is capable of estimating the distance to the object to be held by the robotic arm 800D. This structure enables the robotic arm 800D to accurately hold the object.

The control signal generator 801 outputs a control signal for controlling the drive mechanism 802 based on the distance information output from the ranging device 1 (image processor 3) and indicating the distance to a subject. The drive mechanism 802 drives the mobile object 800 or part of the mobile object 800 by the control signal output from the control signal generator 801. For example, the drive mechanism 802 performs at least one of movement, rotation, acceleration, deceleration, adjustment of thrust (lift), a change in the travel direction, switching between a normal operation mode and an automatic operation mode (crash avoidance mode), and the operation of a safety device such as an air-bag, for the mobile object 800 or part of the mobile object 800. For example, when the distance to a subject is less than a threshold, the drive mechanism 802 may perform at least one of deceleration, adjustment of thrust (lift), a change to a direction away from the object, and switching from the normal operation mode to the automatic operation mode (crash avoidance mode).
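
As a hedged illustration of this mapping from distance information to a control signal, the following sketch switches to the crash avoidance mode when the distance falls below the threshold; the signal vocabulary ("crash_avoidance", "decelerate", etc.) is hypothetical and not part of the embodiment.

```python
def generate_control_signal(distance_to_subject, threshold):
    """Maps the distance information from the ranging device 1 to an
    operation of the drive mechanism 802."""
    if distance_to_subject < threshold:
        # The subject is closer than the threshold: switch to the
        # automatic operation mode and request avoidance.
        return {"mode": "crash_avoidance", "action": "decelerate"}
    return {"mode": "normal", "action": "maintain"}
```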

The drive mechanism 802 of the automobile 800A shown in FIG. 24 is, for example, a tire. The drive mechanism 802 of the drone 800B shown in FIG. 25 is, for example, a propeller. The drive mechanism 802 of the mobile robot 800C shown in FIG. 26 is, for example, a leg portion. The drive mechanism 802 of the robotic arm 800D shown in FIG. 27 is, for example, a supporting portion which supports the leading end at which the capture unit 2 is provided.

The mobile object 800 may further include a speaker or a display to which the information (distance information) output from the ranging device 1 and related to the distance to a subject is input. The speaker and the display are connected to the ranging device 1 by wire or wirelessly and are configured to output sound or an image related to the distance to a subject. Moreover, the mobile object 800 may include a light emitting unit to which the information output from the ranging device 1 and related to the distance to a subject may be input. The light emitting unit may be configured to be switched on and off in accordance with the distance to a subject.

When the mobile object 800 is the drone 800B, at the time of the preparation of a map (the three-dimensional shape of an object), the structural survey of a building or terrain, or the inspection of cracks or breaks in electric cables from above, the drone 800B obtains an image of a subject captured by the capture unit 2 and determines whether or not the distance to the subject is greater than or equal to a threshold. The control signal generator 801 generates a control signal for controlling the thrust of the drone 800B based on the result of the determination such that the distance to the target to be inspected is constant. Here, the thrust is assumed to include lift. As the drive mechanism 802 causes the drone 800B to operate based on the control signal, the drone 800B can be flown parallel to the target to be inspected. When the mobile object 800 is a drone for observation, a control signal for controlling the thrust of the drone such that the distance to the object to be observed is constant may be generated.

When the mobile object 800 (for example, the drone 800B) is used for the maintenance and inspection of various infrastructures (hereinafter, simply referred to as an infrastructure), the distance to the portion to be repaired in the infrastructure, such as a cracked portion or a rusted portion, can be obtained by capturing an image of the portion to be repaired with the capture unit 2. In this case, the size of the portion to be repaired can be calculated from the image by using the distance to the portion to be repaired. In this structure, for example, when the portion to be repaired is displayed on a map showing the entire infrastructure, the inspector of the infrastructure can identify the portion to be repaired. Informing the inspector of the size of the portion to be repaired in advance is effective for smoothly conducting maintenance and repairs.
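
For example, under a pinhole-camera assumption (a standard model, not stated in the embodiment), the size of the portion to be repaired follows from its extent in the image, the obtained distance and the focal length; the function below is an illustrative sketch.

```python
def physical_size(extent_px, distance_m, focal_length_px):
    """Real-world size of the portion to be repaired, from its extent in
    the image (pixels), the obtained distance (meters) and the focal
    length expressed in pixels: size = extent * distance / focal length."""
    return extent_px * distance_m / focal_length_px

# A crack spanning 120 px, seen from 5 m with a 1,000 px focal length,
# is roughly 0.6 m long.
print(physical_size(120, 5.0, 1000.0))  # -> 0.6
```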

When the drone 800B flies, the drone 800B obtains an image captured by the capture unit 2 in the direction of the ground, and determines whether or not the distance to the ground is greater than or equal to a threshold. The control signal generator 801 generates a control signal for controlling the thrust of the drone 800B based on the result of the determination such that the height from the ground is the specified height. As the drive mechanism 802 causes the drone 800B to operate based on the control signal, the drone 800B can be flown at the specified height. When the drone 800B is a drone for pesticide spraying, pesticide can be scattered evenly by keeping the height of the drone 800B from the ground constant.
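
A minimal sketch of such height keeping, assuming a simple proportional control law (the embodiment only requires that the thrust be controlled so that the height matches the specified height); the same scheme applies to keeping the distance to an inspection target or to a preceding vehicle constant. The gain and hover_thrust values are illustrative assumptions.

```python
def thrust_command(measured_height, specified_height, hover_thrust, gain=0.5):
    """Raises the thrust when the drone 800B is below the specified
    height and lowers it when the drone is above it."""
    error = specified_height - measured_height
    return hover_thrust + gain * error
```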

When the mobile object 800 is the automobile 800A or the drone 800B, at the time of the cooperative running of automobiles 800A or the cooperative flying of drones 800B, the mobile object 800 obtains an image of the automobile on the front side or of an adjacent drone captured by the capture unit 2 and determines whether or not the distance to the automobile or drone is greater than or equal to a threshold. The control signal generator 801 generates a control signal for controlling the speed of the automobile 800A or the thrust of the drone 800B based on the result of the determination such that the distance to the automobile on the front side or the adjacent drone is constant. As the drive mechanism 802 causes the automobile 800A or the drone 800B to operate based on the control signal, the cooperative running of the automobile 800A or the cooperative flying of the drone 800B can be easily performed.

Further, when the mobile object 800 is the automobile 800A, an instruction from the driver of the automobile 800A may be received via a user interface such that the driver can set (change) the threshold. In this structure, the driver can drive the automobile 800A while keeping a desired distance from another automobile. To keep a safe distance from the automobile ahead, the threshold may be changed in accordance with the speed of the automobile 800A. The safe distance from the automobile ahead changes depending on the speed of the automobile 800A. Thus, the threshold can be set so as to be greater (longer) as the automobile 800A is driven faster.
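
One possible speed-dependent threshold, sketched under the common assumption that the safe distance is the reaction distance plus the braking distance; the constants are illustrative, not part of the embodiment.

```python
def safe_distance_threshold(speed_mps, reaction_time_s=1.5, decel_mps2=6.0):
    """Threshold that grows with the speed of the automobile 800A:
    reaction distance plus braking distance."""
    return speed_mps * reaction_time_s + speed_mps ** 2 / (2.0 * decel_mps2)
```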

When the mobile object 800 is the automobile 800A, a predetermined distance in the travel direction may be set as the threshold, and a control signal for operating the brakes or a safety device such as an air-bag may be generated when an object emerges at a distance less than the threshold. In this case, a safety device such as an automatic braking device or an air-bag is provided in the drive mechanism 802.

According to at least one embodiment described above, it is possible to provide an image processing device, a ranging device and a method capable of improving robustness when a distance is obtained from an image.

Each of the various functions described in the embodiment and the modification may be realized by a circuit (processing circuit). For example, the processing circuit includes a programmed processor such as a central processing unit (CPU). The processor executes each described function by executing a computer program (a group of instructions) stored in a memory. The processor may be a microprocessor including an electric circuit. For example, the processing circuit includes a digital signal processor (DSP), an application specific integrated circuit (ASIC), a microcontroller, a controller and other electric circuit components. Each of the components described in the present embodiment other than the CPU may also be realized by a processing circuit.

Each process of the present embodiment can be realized by a computer program. Therefore, an effect similar to that of the present embodiment can be easily obtained by merely installing the computer program on a computer from a computer-readable storage medium storing the computer program and executing it.

While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

What is claimed is:
1. An image processing device comprising: storage which stores a statistical model generated by learning bokeh produced in a first image affected by aberration of an optical system, the bokeh changing nonlinearly in accordance with a distance to a subject in the first image, and a processor which obtains a second image affected by the aberration of the optical system, and inputs the second image to the statistical model and obtains distance information indicating a distance to a subject in the second image.
2. The image processing device of claim 1, wherein the bokeh changing nonlinearly in accordance with the distance to the subject in the first image comprises bokeh produced by the aberration of the optical system.
3. The image processing device of claim 1, wherein the bokeh changing nonlinearly in accordance with the distance to the subject in the first image comprises bokeh produced in accordance with a size or shape of an aperture of a diaphragm mechanism adjusting an amount of light taken into the optical system.
4. The image processing device of claim 1, wherein the statistical model is generated by learning only the bokeh changing nonlinearly in accordance with the distance to the subject in the first image.
5. The image processing device of claim 1, wherein the statistical model is generated by learning the bokeh changing nonlinearly in accordance with the distance to the subject in each of local areas in the first image.
6. The image processing device of claim 1, wherein the statistical model comprises a neural network or random forests.
7. The image processing device of claim 1, wherein the statistical model extracts bokeh changing in accordance with a distance from the second image, and estimates a distance corresponding to the bokeh, and the processor obtains distance information indicating the estimated distance.
8. The image processing device of claim 5, wherein the processor extracts a local area from the second image, inputs information related to the extracted local area to the statistical model, and obtains distance information indicating a distance to a subject in the local area.
9. The image processing device of claim 8, wherein the information input to the statistical model and related to the local area comprises information indicating a difference of a pixel value between each of pixels included in the local area and its adjacent pixel.
10. The image processing device of claim 8, wherein the information input to the statistical model and related to the local area comprises location information of the local area in the second image.
11. The image processing device of claim 10, wherein the location information comprises information indicating coordinates of a center point of the local area on the second image.
12. The image processing device of claim 1, wherein the statistical model estimates a distance to the subject for each of a plurality of pixels included in the second image, and calculates uncertainty for the estimated distance, and the processor obtains distance information indicating the estimated distance and the calculated uncertainty.
13. The image processing device of claim 12, wherein the processor discards, of the obtained distance information, distance information indicating a distance in which the calculated uncertainty is greater than or equal to a threshold.
14. The image processing device of claim 12, wherein the processor corrects a first distance indicated by the obtained distance information, using a second distance estimated for a pixel located around a pixel in which the first distance is estimated, uncertainty for the first distance is greater than or equal to a threshold, and uncertainty for the second distance is less than the threshold.
15. A ranging device comprising: a capture unit which captures an image, storage which stores a statistical model generated by learning bokeh produced in a first image affected by aberration of an optical system of the capture unit, the bokeh changing nonlinearly in accordance with a distance to a subject in the first image, and a processor which obtains a second image affected by the aberration of the optical system of the capture unit from the capture unit, and inputs the second image to the statistical model, and obtains distance information indicating a distance to a subject in the second image.
16. A method comprising: storing, in storage, a statistical model generated by learning bokeh produced in a first image affected by aberration of an optical system, the bokeh changing nonlinearly in accordance with a distance to a subject in the first image; obtaining a second image affected by the aberration of the optical system; and inputting the second image to the statistical model and obtaining distance information indicating a distance to a subject in the second image.